Heterogeneous face recognition (HFR) aims to match facial images acquired from different sensing modalities, with mission-critical applications in the forensics, security and commercial sectors. However, HFR is much more challenging than traditional face recognition because of the large intra-class variations of heterogeneous face images and the limited number of training samples of cross-modality face image pairs. This paper proposes a novel approach, the Wasserstein CNN (convolutional neural network, or WCNN for short), to learn invariant features between near-infrared and visual face images (i.e., NIR-VIS face recognition). The low-level layers of WCNN are trained with widely available face images in the visual spectrum. The high-level layer is divided into three parts: the NIR layer, the VIS layer and the NIR-VIS shared layer. The first two aim to learn modality-specific features, and the NIR-VIS shared layer is designed to learn a modality-invariant feature subspace. The Wasserstein distance is introduced into the NIR-VIS shared layer to measure the dissimilarity between heterogeneous feature distributions. WCNN learning thus aims to minimize the Wasserstein distance between the NIR distribution and the VIS distribution to obtain an invariant deep feature representation of heterogeneous face images. To avoid over-fitting on small-scale heterogeneous face data, a correlation prior is introduced on the fully-connected layers of WCNN to reduce the parameter space. This prior is implemented by a low-rank constraint in an end-to-end network. The joint formulation leads to an alternating minimization for deep feature representation at the training stage and an efficient computation for heterogeneous data at the testing stage. Extensive experiments on three challenging NIR-VIS face recognition databases demonstrate that Wasserstein CNN significantly outperforms state-of-the-art methods.
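As a rough illustration of measuring the dissimilarity between NIR and VIS feature distributions, the following minimal sketch computes a squared 2-Wasserstein distance between two batches of features. It assumes, purely for illustration, that each batch is modeled as a Gaussian with diagonal covariance, in which case the distance has a simple closed form; the abstract does not state which form of the Wasserstein distance WCNN uses. The function name `gaussian_w2_squared` and the synthetic `nir` and `vis` arrays are hypothetical and not taken from the paper.

```python
import numpy as np


def gaussian_w2_squared(feats_a, feats_b):
    """Squared 2-Wasserstein distance between diagonal Gaussians.

    Assumption (not from the paper): each batch of features, shaped
    (num_samples, feature_dim), is summarized by a per-dimension mean
    and standard deviation. For diagonal Gaussians the closed form is
    ||mean_a - mean_b||^2 + ||std_a - std_b||^2.
    """
    mean_a, mean_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    std_a, std_b = feats_a.std(axis=0), feats_b.std(axis=0)
    mean_term = np.sum((mean_a - mean_b) ** 2)
    std_term = np.sum((std_a - std_b) ** 2)
    return float(mean_term + std_term)


# Synthetic stand-ins for shared-layer features of the two modalities.
rng = np.random.default_rng(0)
nir = rng.normal(loc=0.5, scale=1.0, size=(256, 8))  # hypothetical NIR features
vis = rng.normal(loc=0.0, scale=1.0, size=(256, 8))  # hypothetical VIS features

print("W2^2(NIR, VIS):", gaussian_w2_squared(nir, vis))
print("W2^2(VIS, VIS):", gaussian_w2_squared(vis, vis))
```

In a training setting, a term of this kind would be added to the recognition loss so that the shared-layer features of the two modalities are pulled toward the same distribution; here it is evaluated only on fixed arrays.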